# load package(s)
library(tidyverse)
library(patchwork)
library(ggthemes)
library(sf)
# load steph curry data
steph_curry <- read_delim(
  file = "data/stephen_curry_shotdata_2023_24.txt",
  delim = "|"
) |>
  janitor::clean_names()
# load ga election data
ga_data <- read_csv("data/ga_election_data.csv") |>
  janitor::clean_names()
# load map data
load("data/ga_map.rda")

Midterm
Data Visualization (STAT 302)
Overview
The midterm attempts to bring together everything you have learned to date. You’ll be asked to replicate a series of graphics to demonstrate your skills and provide short descriptions/explanations regarding issues and concepts in ggplot2.
You are free to use any resource at your disposal such as notes, past labs, the internet, fellow students, instructor, TA, etc. However, do not simply copy and paste solutions. This is a chance for you to assess how much you have learned and determine if you are developing practical data visualization skills and knowledge.
Datasets
The datasets used for this midterm are stephen_curry_shotdata_2023_24.txt, ga_election_data.csv, and ga_map.rda. You will also need the nbahalfcourt.jpg image.
Below you can find a short description of the variables contained in stephen_curry_shotdata_2023_24.txt:
- GAME_ID: Unique ID for each game during the season
- HOME: Indicates if game is "Home" or "Away"
- PLAYER_ID: Unique player ID
- PLAYER_NAME: Player's name
- TEAM_ID: Unique team ID
- TEAM_NAME: Team name
- PERIOD: Quarter or period of the game
- MINUTES_REMAINING: Minutes remaining in quarter/period
- SECONDS_REMAINING: Seconds remaining in quarter/period
- EVENT_TYPE: "Missed Shot" or "Made Shot"
- SHOT_DISTANCE: Shot distance in feet
- LOC_X: X location of shot attempt according to tracking system
- LOC_Y: Y location of shot attempt according to tracking system
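After janitor::clean_names() (as in the setup code), these columns become snake_case, e.g. event_type, period, and shot_distance. A minimal sketch of summarizing make rate by period, using a small made-up tibble in place of the real file (all values are hypothetical):

```r
library(dplyr)
library(tibble)

# Made-up shots standing in for the real file (hypothetical values)
shots <- tibble(
  period = c(1, 1, 2, 2, 3, 4),
  event_type = c("Made Shot", "Missed Shot", "Made Shot",
                 "Made Shot", "Missed Shot", "Missed Shot"),
  shot_distance = c(3, 26, 24, 1, 29, 31)
)

# Number of attempts and share of attempts made in each period
make_rate <- shots |>
  group_by(period) |>
  summarise(
    attempts = n(),
    make_rate = mean(event_type == "Made Shot"),
    .groups = "drop"
  )

make_rate
```

The same pattern works with shot_distance (or a binned version of it) in place of period.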
The ga_election_data.csv dataset contains the state of Georgia's county-level results for the 2020 US presidential election. Here is a short description of the variables it contains:
- County: name of county in Georgia
- Candidate: name of candidate on the ballot
- Election Day Votes: number of votes cast on election day for a candidate within a county
- Absentee by Mail Votes: number of votes cast absentee by mail, pre-election day, for a candidate within a county
- Advanced Voting Votes: number of votes cast in-person, pre-election day, for a candidate within a county
- Provisional Votes: number of votes cast on election day for a candidate within a county needing voter eligibility verification
- Total Votes: total number of votes for a candidate within a county
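After janitor::clean_names(), these columns become county, candidate, election_day_votes, absentee_by_mail_votes, advanced_voting_votes, provisional_votes, and total_votes. One reasonable definition of early voting, assumed here, counts absentee-by-mail plus advanced (in-person, pre-election day) votes. A minimal sketch of computing that share per candidate and per county on a small made-up table (counties, candidates, and counts are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical rows mimicking the cleaned column names
votes <- tibble(
  county = c("A", "A", "B", "B"),
  candidate = c("Cand 1", "Cand 2", "Cand 1", "Cand 2"),
  election_day_votes = c(20, 30, 10, 40),
  absentee_by_mail_votes = c(30, 10, 20, 10),
  advanced_voting_votes = c(50, 60, 70, 50),
  provisional_votes = c(0, 0, 0, 0),
  total_votes = c(100, 100, 100, 100)
)

# Early votes = absentee by mail + advanced (assumed definition)
early_share <- votes |>
  mutate(
    early_votes = absentee_by_mail_votes + advanced_voting_votes,
    prop_early = early_votes / total_votes
  )

# County-wide share: sum over candidates before dividing
county_share <- early_share |>
  group_by(county) |>
  summarise(prop_early = sum(early_votes) / sum(total_votes), .groups = "drop")
```

Summing counts before dividing gives the county-wide share; averaging the per-candidate proportions would weight candidates equally regardless of how many votes each received.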
We have also included the map data for Georgia (ga_map.rda) which was retrieved using tigris::counties().
Exercise 1
Using the stephen_curry_shotdata_2023_24.txt dataset, replicate, as closely as possible, the graphics below (2 required, 1 optional/bonus). After replicating the graphics, provide a summary of what they indicate about Stephen Curry's shot selection, such as distance from the hoop, shot make/miss rate, and how makes and misses compare across distance and game time (i.e., across quarters/periods).
Plot 1
Hints:
- Figure width 6 inches and height 4 inches, which is taken care of in the code chunk YAML with fig-width and fig-height
- Use the minimal theme (theme_minimal()) and adjust from there
- Useful hex colors: "#1D428A" and "#FFC72C80"
- While the plot needs to be very close to the one shown, it does not need to be exact in terms of values. If you want to make it exact, here are some useful values used, sometimes repeatedly, to make the plot: 12, 14, & 16
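Two details in these hints are easy to miss: an eight-digit hex code such as "#FFC72C80" carries an alpha channel in its last two digits (80 in hex is 128 of 255, roughly 50% opacity), and theme_minimal() is only a starting point that theme() then adjusts. A minimal sketch on made-up data; the histogram, bin width, and labels are arbitrary choices for illustration and are not meant to reproduce Plot 1:

```r
library(ggplot2)

# Made-up shot distances in feet (not the real data)
toy <- data.frame(shot_distance = c(1, 2, 2, 5, 12, 24, 25, 26, 27, 30))

p1_sketch <- ggplot(toy, aes(x = shot_distance)) +
  # "#FFC72C80": the trailing "80" is an alpha channel (~50% opacity)
  geom_histogram(binwidth = 2, fill = "#FFC72C80", color = "#1D428A") +
  labs(x = "Shot distance (ft)", y = "Count", title = "Hypothetical title") +
  theme_minimal() +
  # Adjust text sizes on top of the minimal theme
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )

p1_sketch
```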
Plot 2
Hints:
- Figure width 6 inches and height 4 inches, which is taken care of in the code chunk YAML with fig-width and fig-height
- Use the minimal theme (theme_minimal()) and adjust from there
- Useful hex colors: "#5D3A9B" and "#E66100"
- No padding on the vertical axis
- Transparency is being used
- annotate() is used to add labels
- While the plot needs to be very close to the one shown, it does not need to be exact in terms of values. If you want to make it exact, here are some useful values used, sometimes repeatedly, to make the plot: 0, 0.035, 0.081, 0.09, 0.25, 4.5, 12, 14, 16, 27.5
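The padding and transparency hints map onto ggplot2 arguments: expand = c(0, 0) in the y scale removes the default expansion around the data range, and alpha sets transparency. A minimal sketch on made-up data; the density geom, label text, and annotation coordinates are hypothetical and not taken from Plot 2:

```r
library(ggplot2)

# Made-up distances for makes and misses (not the real data)
toy <- data.frame(
  shot_distance = c(1, 2, 3, 24, 25, 26, 2, 4, 26, 27, 28, 30),
  event_type = rep(c("Made Shot", "Missed Shot"), each = 6)
)

p2_sketch <- ggplot(toy, aes(x = shot_distance, fill = event_type)) +
  geom_density(alpha = 0.25, color = NA) +   # alpha < 1 makes overlaps visible
  scale_fill_manual(values = c("Made Shot" = "#5D3A9B", "Missed Shot" = "#E66100")) +
  # expand = c(0, 0) removes the default padding on the vertical axis
  scale_y_continuous(expand = c(0, 0)) +
  # annotate() adds a one-off text label at fixed (hypothetical) coordinates
  annotate("text", x = 3, y = 0.05, label = "Made", color = "#5D3A9B") +
  theme_minimal() +
  theme(legend.position = "none")

p2_sketch
```

Labeling groups directly with annotate() is one way to drop the legend while keeping the colors interpretable.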
Plot 3 — Optional/Bonus
Hints:
- Figure width 7 inches and height 7 inches, which is taken care of in the code chunk YAML with fig-width and fig-height
- Colors used: "grey", "red", "orange", "yellow" (you don't have to use "orange"; you can get away with using only "red" and "yellow")
- To set 15+ as the highest value, you need to set the limits in the appropriate scale while also setting the na.value to the top color
- While the plot needs to be very close to the one shown, it does not need to be exact in terms of values. If you want to make it exact, here are some useful values used, sometimes repeatedly, to make the plot: 0, 0.7, 5, 12, 14, 15, 16, 20
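The 15+ hint relies on how continuous scales treat out-of-range values: with limits = c(0, 15), values above 15 are replaced with NA by the default out-of-bounds handling and are then drawn with na.value. A minimal sketch on a made-up grid of counts (the tile layout and values are hypothetical):

```r
library(ggplot2)

# Hypothetical grid of shot counts; anything above 15 should read as "15+"
toy <- expand.grid(x = 1:4, y = 1:4)
toy$n <- c(0, 2, 5, 8, 11, 14, 15, 18, 20, 25, 1, 3, 7, 9, 16, 30)

p3_sketch <- ggplot(toy, aes(x = x, y = y, fill = n)) +
  geom_tile() +
  scale_fill_gradientn(
    colours = c("grey", "red", "orange", "yellow"),
    limits = c(0, 15),     # values above 15 fall outside the limits...
    na.value = "yellow",   # ...become NA and take the top color
    breaks = c(0, 5, 10, 15),
    labels = c("0", "5", "10", "15+")
  ) +
  coord_fixed()

p3_sketch
```

Relabeling the top break as "15+" is what tells the reader that the last color also covers larger counts.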
Summary
Provide a summary of what the graphics above indicate about Stephen Curry's shot selection, such as distance from the hoop, shot make/miss rate, and how makes and misses compare across distance and game time (i.e., across quarters/periods).
Exercise 2
Using the ga_election_data.csv dataset in conjunction with the mapping data ga_map.rda, replicate, as closely as possible, the graphic below. Note that the graphic consists of two plots displayed side by side. Both plots use the same shading scheme (i.e., scale limits and fill options).
Holding the 2020 US presidential election during the COVID-19 pandemic was a massive logistical undertaking. Additionally, voter engagement was historically high. Voting operations, headed by the states, ran very smoothly and encountered few COVID-19-related issues. Georgia did a particularly good job by encouraging its residents to vote early. In a typical county, about 75% of votes were cast early! Statewide, about 80% of Georgia voters (4 in every 5) voted early.
While it is clear that early voting was the preferred option for Georgia voters, we want to investigate whether voters for one candidate/party used early voting more than voters for the other. We are focusing on the two major candidates/parties. We created the graphic below, which you are tasked with recreating, to explore the relationship between voting modality and voter preference.
Hints:
- Figure width 7 inches and height 7 inches, which is taken care of in the code chunk YAML with fig-width and fig-height
- Make two plots, then arrange the plots accordingly using the patchwork package
- patchwork::plot_annotation() will be useful for adding the graphic title and caption; you'll also set the theme options for the graphic title and caption (think font size and face); code has been provided
- ggthemes::theme_map() was used as the base theme for the plots
- scale_*_gradient2() will be helpful
- Useful hex colors: "#5D3A9B" and "#1AFF1A"
- While the plot needs to be very close to the one shown, it does not need to be exact in terms of values. If you want to make it exact, here are some useful values used, sometimes repeatedly, to make the plot: 0.5, 0.75, 1, 10, 12, 14, 24
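A minimal sketch of the patchwork workflow from these hints. Because ga_map.rda and the election data are not bundled here, it uses the North Carolina county shapefile that ships with sf as a stand-in, with hypothetical proportion columns. The "white" midpoint color, the midpoint and limits, and the title/caption text are assumptions; use the provided title/caption theme code for the actual graphic. Building the fill scale from one helper function keeps both panels on identical limits and colors:

```r
library(ggplot2)
library(patchwork)
library(ggthemes)
library(sf)

# NC counties ship with sf; used here only as a stand-in for ga_map
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Hypothetical proportions in (0, 1) so both panels can share one scale
nc$prop_a <- nc$BIR79 / (nc$BIR74 + nc$BIR79)
nc$prop_b <- 1 - nc$prop_a

# One helper so both maps get the same limits and diverging colors
shared_fill <- function() {
  scale_fill_gradient2(
    low = "#5D3A9B", mid = "white", high = "#1AFF1A",
    midpoint = 0.5, limits = c(0, 1)
  )
}

map_a <- ggplot(nc) +
  geom_sf(aes(fill = prop_a)) +
  shared_fill() +
  theme_map() +
  labs(title = "Panel A (hypothetical)")

map_b <- ggplot(nc) +
  geom_sf(aes(fill = prop_b)) +
  shared_fill() +
  theme_map() +
  labs(title = "Panel B (hypothetical)")

# Place the maps side by side, then add an overall title and caption
combined <- map_a + map_b +
  plot_annotation(
    title = "Hypothetical title",
    caption = "Hypothetical caption",
    theme = theme(
      plot.title = element_text(size = 24, face = "bold"),
      plot.caption = element_text(size = 10)
    )
  )

combined
```

Because both panels use identical scale settings, patchwork::plot_layout(guides = "collect") could also merge the two legends into one.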
Plot
Summary
Provide a summary of how the two maps relate to one another. That is, what insight can we gain from the graphic?
Exercise 3
Question 1
Name and briefly describe the core concept/idea that the ggplot2 package uses to build graphics.
Question 2
Explain the difference between using geom_bar() or geom_col() to make a bar plot.
Question 3
Explain aesthetic mappings and their purpose.
Question 4
What two core things do scales provide/control in ggplot2?